Scoring tests separately by seroperson · Pull Request #350 · entrius/gittensor

seroperson · 2026-04-07T00:23:23Z

Closes #339

This PR addresses the issue of test code affecting the scoring process.

Before

Test, source, and non-code changes are all counted toward the contribution bonus and toward the density calculation.
Adding tests significantly reduces your score, because they earn almost no score but still affect density.
Adding non-code changes also affects the source density and may lower the score. The situation is better than with tests, as recognized files (identified by extension) at least contribute some score. Unrecognized files, however, always lower the score, as they earn zero score and still affect density.
The situation is the same for deleted and binary files: they contribute no score but still affect density.

After

Test, source, and non-code changes each have their own density and their own score.
Only source changes count toward the contribution bonus.
After each category is scored, the category scores are summed.
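To make the before/after difference concrete, here is a minimal sketch. It assumes density is score per changed line capped at 1.0, and that a category's base score is a constant multiplied by its density. The constant `BASE`, the cap, and the sample numbers are illustrative assumptions, not values taken from this PR.

```python
from typing import Dict, Tuple

# Hypothetical constants and formula, for illustration only.
BASE = 30.0
DENSITY_CAP = 1.0

# Each category maps to (score, changed_lines).
Categories = Dict[str, Tuple[float, int]]


def density(score: float, lines: int) -> float:
    """Score per changed line, capped (assumed formula)."""
    if lines <= 0:
        return 0.0
    return min(score / lines, DENSITY_CAP)


def pooled_base_score(categories: Categories) -> float:
    """'Before': a single density computed over every changed file."""
    total_score = sum(score for score, _ in categories.values())
    total_lines = sum(lines for _, lines in categories.values())
    return BASE * density(total_score, total_lines)


def per_category_base_score(categories: Categories) -> float:
    """'After': each category gets its own density; the results are summed."""
    return sum(BASE * density(score, lines) for score, lines in categories.values())


source_only: Categories = {"source": (50.0, 100)}
with_tests: Categories = {"source": (50.0, 100), "test": (5.0, 100)}

pooled_before = pooled_base_score(source_only)        # density 50/100
pooled_after = pooled_base_score(with_tests)          # density 55/200, diluted
split_before = per_category_base_score(source_only)
split_after = per_category_base_score(with_tests)     # source part is unchanged
```

Under these assumptions, adding the low-scoring test lines lowers the pooled score but cannot lower the per-category sum, because the source density no longer depends on the test lines.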

Implementation details

Added the ScoringCategory enum. Possible values are TEST (the file is a test file), SOURCE (the scoring method is tree_sitter and the file is not a test file), and NON_CODE (everything else: recognized non-code changes, unrecognized non-code changes, deleted files, binary files).
Added PrScoringResultCategorized, which holds a PrScoringResult for each category.
Moved the scoring logic from scoring.py into PrScoringResultCategorized methods.
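The categorization rule described above could look roughly like the following. This is a sketch, not the PR's actual code: the `FileScoreResult` here is a trimmed-down stand-in, the enum's string values are assumptions, and `"line_count"` and `"skipped"` are hypothetical names for scoring methods other than `tree_sitter`.

```python
from dataclasses import dataclass
from enum import Enum


class ScoringCategory(Enum):
    # Member names follow the PR description; the string values are assumed.
    TEST = "test"
    SOURCE = "source"
    NON_CODE = "non_code"


@dataclass
class FileScoreResult:
    # Simplified stand-in; the real class is assumed to carry more fields.
    filename: str
    is_test_file: bool
    scoring_method: str
    score: float = 0.0

    @property
    def category(self) -> ScoringCategory:
        # Test files win first, regardless of how they were scored.
        if self.is_test_file:
            return ScoringCategory.TEST
        # Non-test files scored through tree-sitter are source code.
        if self.scoring_method == "tree_sitter":
            return ScoringCategory.SOURCE
        # Everything else: non-code, unrecognized, deleted, or binary files.
        return ScoringCategory.NON_CODE
```

Checking `is_test_file` before the scoring method matters: a test file scored with tree-sitter must still land in TEST, not SOURCE.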

Test cases

Test files:

test_adding_tests_does_not_reduce_score - Adding test files to a source PR never lowers the base score
test_tests_do_not_affect_contribution_bonus - Small and large test files produce modest, similar increases (test weight is 0.05x)
test_same_code_in_test_path_scores_much_lower - Identical code in a test directory scores 10x+ lower than in a source path
test_tests_do_not_affect_threshold - Test files can't push a below-threshold PR past the token score threshold

Non-code files:

test_adding_non_code_files_does_not_reduce_score - Adding non-code files (markdown, yaml) never lowers the base score
test_non_code_does_not_affect_contribution_bonus - Small and large non-code files produce the same density increase (no bonus impact)
test_source_code_scores_much_higher_than_non_code - Tree-diff scored source code scores 10x+ higher than line-count scored non-code files
test_non_code_does_not_affect_threshold - Non-code files can't push a below-threshold PR past the token score threshold

Zero-score files:

test_deleted_file_does_not_change_score - Deleted files (score=0) don't affect the base score
test_unsupported_file_does_not_change_score - Unsupported extensions (score=0) don't affect the base score

Density:

test_adding_test_category_increases_score_beyond_single_cap - Per-category density cap allows multiple categories to contribute independently
test_verbose_formatting_decreases_score - Same logic in more lines produces lower density and lower score
test_modified_file_scores_diff_only - Modified files score only the AST diff, not the entire file

Threshold:

test_below_threshold_scores_less - Trivial changes below token score threshold score less than substantial changes

anderdc · 2026-04-07T01:29:29Z

just merged big changes into test please fix conflicts thanks

seroperson · 2026-04-09T07:42:37Z

@anderdc I've rebased it on current test 👍

anderdc · 2026-04-10T15:52:44Z

I have a PR going into test now that will fix that pyright type check CI, don't worry about it

anderdc

Review

Good problem identification — test and non-code files dragging down source scores via density is a real issue and the per-category approach is the right direction. The ScoringCategory enum and the category property on FileScoreResult are clean additions. The test suite is thorough and covers the right invariants.

However, the implementation adds more scaffolding than necessary and introduces a redundant iteration. Requesting changes on the following:

1. Don't loop twice over file results

from_file_results() re-iterates all file_results to categorize and re-sum totals that the existing loop in tree_sitter_scoring.py already computes. The category is trivially known at each append site (is_test_file + scoring_method), so just accumulate into per-category dicts inline during the existing loop. No second pass needed.
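A single-pass version of this idea might look like the sketch below. `FileResult` and `score_files` are hypothetical stand-ins for the real per-file result and the existing loop in tree_sitter_scoring.py, and plain strings stand in for the ScoringCategory members.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FileResult(NamedTuple):
    # Hypothetical, trimmed-down stand-in for a per-file scoring result.
    is_test_file: bool
    scoring_method: str
    score: float
    lines: int


def score_files(files: Iterable[FileResult]) -> Dict[str, Dict[str, float]]:
    """Accumulate per-category totals in the same loop that visits each file."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"score": 0.0, "lines": 0}
    )
    for f in files:
        # The category is known right where the file result would be appended.
        if f.is_test_file:
            category = "test"
        elif f.scoring_method == "tree_sitter":
            category = "source"
        else:
            category = "non_code"
        totals[category]["score"] += f.score
        totals[category]["lines"] += f.lines
    return dict(totals)


totals = score_files([
    FileResult(False, "tree_sitter", 40.0, 80),
    FileResult(True, "tree_sitter", 4.0, 60),
    FileResult(False, "line_count", 1.0, 20),
    FileResult(False, "tree_sitter", 10.0, 20),
])
```

Each file is touched once, and the per-category totals are ready as soon as the loop ends.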

2. Extend `PrScoringResult` instead of adding `PrScoringResultCategorized`

The new wrapper class duplicates fields from PrScoringResult (total_score, total_nodes_scored, score_breakdown) and adds indirection. Since PrScoringResult is only constructed in one place and consumed in one place, just add by_category: Dict[ScoringCategory, PrScoringResult] and total_lines: int to the existing class. No need for a parallel class hierarchy or the _EMPTY_SCORING_RESULT sentinel.
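As a rough illustration of this suggestion, the existing class could gain the two fields directly. The field names come from the review; the defaults, the enum values, and the omission of score_breakdown are simplifying assumptions for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ScoringCategory(Enum):
    TEST = "test"
    SOURCE = "source"
    NON_CODE = "non_code"


@dataclass
class PrScoringResult:
    total_score: float = 0.0
    total_nodes_scored: int = 0
    # score_breakdown is left out of this sketch for brevity.
    total_lines: int = 0
    # Per-category results reuse the same class, so no wrapper type is needed.
    by_category: Dict[ScoringCategory, "PrScoringResult"] = field(default_factory=dict)


result = PrScoringResult(
    total_score=55.0,
    total_nodes_scored=120,
    total_lines=200,
    by_category={
        ScoringCategory.SOURCE: PrScoringResult(
            total_score=50.0, total_nodes_scored=100, total_lines=100
        ),
        ScoringCategory.TEST: PrScoringResult(
            total_score=5.0, total_nodes_scored=20, total_lines=100
        ),
    },
)

# A category with no files is simply absent, so no sentinel object is required.
non_code = result.by_category.get(ScoringCategory.NON_CODE)
```

Callers that want a zero-valued fallback can pass `PrScoringResult()` as the default to `dict.get`.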

3. Keep scoring logic in `scoring.py`

calculate_initial_base_score() and calculate_contribution_bonus() are business logic that reference scoring constants (MERGED_PR_BASE_SCORE, MAX_CONTRIBUTION_BONUS, CONTRIBUTION_SCORE_FOR_FULL_BONUS). This logic currently lives in scoring.py alongside the rest of the scoring decisions. Moving it into a dataclass in classes.py scatters scoring logic across two files. Keep it in scoring.py — the caller can iterate by_category directly with a few lines.
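For instance, a module-level bonus function in scoring.py could read the source category straight from the per-category data. This is only a sketch: the linear ramp is an assumed formula, the value 30 mirrors the "max 30" in the log example quoted in this review, and 500 is a placeholder rather than the project's real constant.

```python
from typing import Dict

# Constant names come from the review; the values are placeholders.
MAX_CONTRIBUTION_BONUS = 30.0
CONTRIBUTION_SCORE_FOR_FULL_BONUS = 500.0


def calculate_contribution_bonus(score_by_category: Dict[str, float]) -> float:
    """Bonus driven by the source category only (assumed linear ramp)."""
    source_score = score_by_category.get("source", 0.0)
    fraction = min(source_score / CONTRIBUTION_SCORE_FOR_FULL_BONUS, 1.0)
    return fraction * MAX_CONTRIBUTION_BONUS


# Test score, however large, does not move the bonus.
bonus_small_tests = calculate_contribution_bonus({"source": 50.0, "test": 10.0})
bonus_large_tests = calculate_contribution_bonus({"source": 50.0, "test": 5000.0})
bonus_capped = calculate_contribution_bonus({"source": 9999.0})
```

Keeping the function next to the constants it reads means the result class stays a plain data holder.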

4. Threshold check may not isolate categories correctly

In calculate_initial_base_score(), the threshold uses self.score_breakdown.total_score, which is the aggregate across all categories, including test and non-code. A PR with trivial source but a large test suite could pass the threshold via test score alone. The existing tests for this (test_tests_do_not_affect_threshold) pass token_score=0.0 explicitly on the factory, so they don't catch this path. The intent is that only SOURCE contributions matter for the threshold, so the check should use the source category's score, not the aggregate.
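The failure mode can be reproduced with a small sketch. The threshold value and both function names are hypothetical; only the idea of comparing the aggregate against the source-only score comes from the review.

```python
from typing import Dict

TOKEN_SCORE_THRESHOLD = 5.0  # placeholder value, not the project's constant


def passes_threshold_aggregate(token_score_by_category: Dict[str, float]) -> bool:
    """Problematic variant: test and non-code score count toward the threshold."""
    return sum(token_score_by_category.values()) >= TOKEN_SCORE_THRESHOLD


def passes_threshold_source_only(token_score_by_category: Dict[str, float]) -> bool:
    """Intended variant: only the source category is compared to the threshold."""
    return token_score_by_category.get("source", 0.0) >= TOKEN_SCORE_THRESHOLD


# Trivial source change accompanied by a large test suite.
trivial_source_big_tests = {"source": 1.0, "test": 40.0, "non_code": 0.0}

aggregate_passes = passes_threshold_aggregate(trivial_source_big_tests)
source_only_passes = passes_threshold_source_only(trivial_source_big_tests)
```

A regression test built on an input like `trivial_source_big_tests`, with a non-zero test score, would exercise the path that the current factory-based tests skip.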

5. Preserve diagnostic detail in log output

The current log shows density and bonus percentage:

Base score: 15.00 (density 0.50) + 3.0 bonus (10% of max 30) = 18.00

The PR reduces this to:

Base score: 15.00 + 3.0 bonus = 18.00

With per-category scoring, the density breakdown is actually more useful to log, not less. Please preserve or improve the diagnostic output.
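One possible shape for a per-category log line is sketched below. The bracketed breakdown format and the function name `format_base_score_log` are suggestions invented for this example; only the surrounding "Base score ... bonus ... of max" wording follows the current log.

```python
from typing import Dict, Tuple


def format_base_score_log(
    per_category: Dict[str, Tuple[float, float]],
    bonus: float,
    bonus_fraction: float,
    max_bonus: float,
) -> str:
    """per_category maps a category name to (base_score, density)."""
    base_total = sum(score for score, _ in per_category.values())
    parts = ", ".join(
        f"{name} {score:.2f} (density {dens:.2f})"
        for name, (score, dens) in per_category.items()
    )
    return (
        f"Base score: {base_total:.2f} [{parts}] "
        f"+ {bonus:.1f} bonus ({bonus_fraction:.0%} of max {max_bonus:g}) "
        f"= {base_total + bonus:.2f}"
    )


line = format_base_score_log(
    {"source": (15.0, 0.50), "test": (1.5, 0.05)},
    bonus=3.0,
    bonus_fraction=0.10,
    max_bonus=30,
)
```

With the sample input, `line` is `Base score: 16.50 [source 15.00 (density 0.50), test 1.50 (density 0.05)] + 3.0 bonus (10% of max 30) = 19.50`, which keeps the density and bonus percentage while showing where each part of the base score came from.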

anderdc and others added 14 commits March 20, 2026 14:34

improved error logging and file extension extraction (entrius#312)

d4815ff

add entrius/allways and entrius/allways-ui to master repositories (en…

c790e93

…trius#317)

Remove nushell/nushell from whitelist (entrius#316)

8e94f01

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

chore: add additional acceptable branches to traefik repos (entrius#315)

c70a24c

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

fix: liveweb-arena weight (entrius#320)

629cb66

Remove astral-sh/ruff from whitelist (entrius#322)

b920647

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

address requested removals (entrius#325)

21b416c

Co-authored-by: Zanie Blue <contact@zanie.dev>

fix: detect inline tests in languages like Rust, Zig, and D for accur…

8a74f90

…ate scoring (entrius#314) Co-authored-by: root <root@135-181-76-236.ptr> Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

chore: update transferred GitHub repo names (entrius#332)

08732e5

Update the agcli repo name which was moved to unarbos (entrius#333)

63582be

Remove astropy/astropy (entrius#334)

087e6e2

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

address removals (entrius#337)

d4ae0aa

fix: issues list --json --id returns null instead of structured error…

ad759cd

… for missing issues (entrius#335)

Revamp (entrius#348)

ec3a822

Co-authored-by: anderdc <me@alexanderdc.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bittoby and others added 4 commits April 6, 2026 22:39

fix: split GraphQL file content fetches into batches of 50 to prevent…

9004ba9

… 502 errors on large PRs (entrius#331)

Deactivate autoppia repos (deregistered) (entrius#347)

29d55a3

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

Improve CLI help output and shared option decorators (entrius#341)

04f1300

Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

fix: use merge-base for tree-diff scoring to avoid inflated token sco…

ae64b04

…res (entrius#346) Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>

seroperson force-pushed the i339-tests-separate-scoring branch 2 times, most recently from d9ff4cd to d8a9df5 Compare April 7, 2026 08:50

tmimmanuel and others added 4 commits April 9, 2026 10:15

fix: clean up stale miner data when hotkey re-links to new GitHub acc…

cbb4956

…ount (entrius#340) Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

chore: accept async-substrate-interface PRs merged to staging (entriu…

a54c755

…s#351) Co-authored-by: Ander <61125407+anderdc@users.noreply.github.com>

Issue discovery (entrius#356)

95d7d05

Co-authored-by: anderdc <me@alexanderdc.com>

Scoring tests separately

4ade5b6

seroperson force-pushed the i339-tests-separate-scoring branch from d8a9df5 to 4ade5b6 Compare April 9, 2026 23:23

anderdc added 2 commits April 10, 2026 11:09

Remove stale miner_tier_stats reference (entrius#357)

6ca2c5b

Merge branch 'test' into i339-tests-separate-scoring

639d9d5

anderdc requested changes Apr 10, 2026

anderdc force-pushed the test branch 2 times, most recently from 8adcdf6 to 9c9860f Compare April 10, 2026 17:07

seroperson wants to merge 24 commits into entrius:test from seroperson:i339-tests-separate-scoring

11 participants
